NAVAL POSTGRADUATE SCHOOL
Monterey, California

SPEECH RECOGNITION IN A COMMAND AND CONTROL WORKSTATION ENVIRONMENT

by

Michael A. LeFever

March 1987

Thesis Advisor: Gary K. Poock

Approved for public release; distribution is unlimited.

UNCLASSIFIED. Master's Thesis. Subject terms: speech recognition, command and control workstation, CCWS, SRI 'Berkeley' speech board, VOTAN speech recognition.

• A minimum vocabulary capacity of 1000 words.
• Real-time response.
• Very high recognition accuracy ( > 98%).
• Adaptable to the user (i.e., the user should not have to modify or alter his speaking rate significantly).
• No deterioration in accuracy in noisy and stressful environments.
These specifications are believed by the author to be those items necessary for an effective and viable speech recognition system. The minimum capacity of 1000 words was specified since this was a previous goal set in 1971 by the Department of Defense. (Barr and Feigenbaum, 1981) An accurate, versatile, and fast large vocabulary system which adapts readily to any user should be the goal of all manufacturers of automatic speech recognizers. Consequently, this list will be the criteria for final evaluation of the SRI 1000 word discrete recognizer and the VOTAN continuous word recognizer. Since each speech recognizer is different, it is crucial that those responsible for the man-machine interface spend sufficient resources in defining the requirements of a particular system and finding the correct speech system to match.

G. CONCLUSION

The sole purpose of a command and control system is to support the commander's decision process. The current system (NTDS) is overwhelmed by the amount of information it must process and is proliferated with ad hoc equipments that were never really designed to be interfaced with this system. An inadequate system exists for today's commander. A systems approach utilizing the technological advances in distributed networks and personal computing led to the development of DCS and CCWS. The workstation in development will incorporate the latest in protocols and will focus on supporting the operational commander. The system design is to take full advantage of the man-machine interfaces. Since our fastest and most efficient means of communication is speech, it is only logical that the design of the CCWS should consider speech input/output interfaces. This will ensure that the architecture for the command and control workstation is designed to be a true extension of the commander.

III. SPEECH TECHNOLOGY: PAST, PRESENT AND FUTURE

A.
OVERVIEW

This chapter will describe the basic types of speech recognition systems and a few of the fundamental terms associated with these systems. The history of speech input/output systems and forecasts of the future of speech technology are discussed in broad detail. It is important to realize that each automatic speech recognizer uses different algorithms. The user must be thoroughly familiar with the particular system to ensure that it is the correct equipment for the task and that proper training and programming of the system has been achieved. A basic familiarity with the terms and the types of speech recognition systems is essential in comprehending this rapidly growing technological field.

B. DEFINITION OF TERMS

Before discussing speech recognition systems, we need to define and discuss the various generic types of speech systems. As shown in Figure 2.1 there are two major types: speaker dependent and speaker independent. A speaker dependent system relies entirely on the user training the speech recognition system. The user speaks an utterance (one or more words in a phrase) usually 1-5 times for each word or a particular output string. The equipment translates the frequency vs. time output into a normalized, digital matrix. Depending on the manufacturer, these may be manipulated by some averaging algorithm or just stored as separate templates in memory or in a data base. A template is the digital representation or matrix of the utterance which is used by the device to compare against your spoken word. Each system uses different algorithms to calculate the template, and a thorough understanding of the algorithm used by the device is required to maximize recognition through proper training. When a particular utterance is spoken, it is compared against the template in memory and, if it is within a pre-established limit or threshold, the device performs the function the user has installed on the system.
If it does not meet the threshold level, the utterance is rejected and nothing is sent by the recognizer. Additionally, there are two other events which can occur: an insertion or a substitution error. An insertion occurs when a recognition takes place due to spurious noise or an utterance other than those that are legitimate entries in the data base. For example, if you said 'defcon' or a similar word NOT in your database and the system recognizes and outputs the string for 'defense'. A substitution, on the other hand, occurs when your input utterance is calculated as a closer match to a different template in storage, thus incorrectly recognizing another word. For example, if 'defcon' and 'defense' ARE currently in the database and the utterance 'defcon' produces the string 'defense'. (Pallett, 1985)

[Figure 2.1: Automatic Speech Recognition Systems. Speech systems divide into speaker dependent and speaker independent types; each is subdivided into discrete, connected, and continuous systems.]

The speaker-dependent, template matching systems are the most common systems on the market. A system trained to a particular individual can achieve recognition accuracies of 90-99 percent. On the other hand, a speaker independent system contains algorithms which are robust enough for any individual to be correctly recognized. Such a device requires no training since each word is represented by templates which are an average of a wide range of different utterances selected by the manufacturer. Depending on the size and limitations of the vocabulary, recognition accuracies are slightly less than those experienced by the speaker dependent systems. The goal of most speech recognition manufacturers and researchers is to develop a large vocabulary recognizer which is independent of the user. (Poock, 1986b) Each of these two categories is further subdivided into three separate categories: discrete, connected, and continuous.
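The template matching and rejection behavior described above can be sketched in a few lines. This is a minimal illustration only: the template vectors, the distance measure, and the threshold value are assumptions for the example, not the algorithm of any particular vendor's recognizer.

```python
import math

# Minimal sketch of speaker-dependent template matching with a rejection
# threshold. Templates are toy 3-element "spectral" vectors; a real
# recognizer compares whole time-frequency matrices.

def recognize(utterance, templates, threshold):
    """Return the best-matching word, or None when the utterance is rejected."""
    best_word, best_dist = None, math.inf
    for word, template in templates.items():
        dist = math.dist(utterance, template)   # distance to stored template
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist <= threshold else None

templates = {
    "defense": [1.0, 2.0, 3.0],
    "defcon":  [1.1, 2.1, 2.0],
}

# A close utterance is accepted; a distant one is rejected (nothing output).
print(recognize([1.0, 2.0, 2.9], templates, threshold=1.0))   # defense
print(recognize([9.0, 9.0, 9.0], templates, threshold=1.0))   # None
```

In these terms, an insertion error is the first call succeeding on an utterance that should not be in the vocabulary at all, and a substitution error is the minimum-distance search landing on the wrong template.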
A discrete system, or isolated word system, as its name implies is one in which the user must pause for a predetermined time (about .1 sec) between consecutive utterances. The device establishes the start and endpoint of the word. These utterances are compared to what is in memory and the output string is sent once the recognizer has calculated the best match. The connected speech system requires no pauses between utterances. The system is continually checking what is spoken against what is in memory. As the word or phrase is recognized, the device is loading that particular string into the output buffer. Once the user pauses, the system unloads all that it has accumulated in the buffer. In contrast, the continuous system outputs the prescribed string immediately upon recognition and does not wait for a pause from the user. Even though there appear to be no apparent word boundaries, the device is able to calculate matches and produce the output strings. This is much harder than discrete recognition since there are major changes which occur in the pronunciation of words at the word boundaries, known as coarticulation. These are differences in speech patterns not found in isolated or discrete word pronunciation. Manufacturers today are still not in agreement over exactly what constitutes the difference between these last two types. As stated earlier, each and every system is different and must be thoroughly tested and analyzed to ascertain exactly what the manufacturer is trying to represent in his literature.

C. PAST

Many of the larger technical companies like IBM, Philco-Ford, RCA, and Bell Telephone Laboratories started research back in the early 50's and 60's. It was not until the early 70's that the first commercially available products were offered by Threshold Technology, Inc. and Scope Electronics. (Poock, 1986b) Concurrently, in the early 1970's, the U. S.
Department of Defense Advanced Research Projects Agency (ARPA) funded a five-year program in speech understanding research (SUR). ARPA funded five speech projects and several subcontracts for developing parts of speech systems. Some of the major ARPA contractors produced multiple systems during the five-year period: Work at Bolt, Beranek and Newman, Inc. (BBN) produced first SPEECHLIS and then HWIM (Hear What I Mean), building on earlier BBN research on understanding natural language. Carnegie-Mellon University (C.M.U.) produced the HEARSAY-I and DRAGON systems in the early development phase (1971-1973) and the HARPY and HEARSAY-II programs by 1976. SRI International also developed a speech understanding program, partly in collaboration with Systems Development Corporation (SDC). (Barr and Feigenbaum, 1981) The ARPA projects were all built for the purpose of developing a speech understanding device, but they varied considerably in levels of difficulty, number of speakers, ambient noise, etc. As a result of this effort there was considerable progress made toward practical speech-understanding systems. One of the most important ideas to surface from these projects was the influence of Artificial Intelligence (AI) research and system architecture. The researchers found phonetic recognition was the most promising answer to continuous speech understanding, but at the time they did not have the computing power necessary, nor was it as straightforward as initially anticipated. Since the early success of speech recognition used template matching, industry abandoned the harder track of speech phonetics.

D. PRESENT

1. Overview

Currently there are literally thousands of organizations in the United States and around the world exploiting speech systems. From controlling robot arms on the space shuttle to incorporation into children's toys, speech input/output systems are in daily use and are growing rapidly.
Despite ARPA's efforts, up until now all the speech systems have consisted of relatively small vocabulary, pattern matching or template matching techniques. The better systems can be expected to have recognition accuracies of better than 97%. There are several periodicals, like the Journal of The American Voice I/O Society and Speech Technology: Man/Machine Voice Communications, which reflect the latest in research, applications of speech processing, and product reviews. In fact, in a recent edition there were 193 different companies listed providing various products and/or services in the speech field. Speech recognition today is extremely capable and reliable and could be applied to thousands of areas with more awareness and understanding of its benefits to both user and management.

2. Speech Applications in Command and Control

Application of speech recognition systems in a shipboard environment need not stop with the CCWS. There are many other areas where using this technology could be beneficial. In the Combat Direction Center, manipulating NTDS displays and functions on these consoles by voice, in conjunction with the trackball, tab, computer controlled action entry panel (CCAEP), digital data entry unit (DDEU), and category select panel, would allow users to more quickly disseminate information and result in less operator fatigue. Data retrieval by the Commanding Officer or Tactical Action Officer to display decision aids or threat matrices by voice could promote better weapon or countermeasures selections. The automatic speech recognizer could allow the commander to focus totally on the display. The Combat Direction Center is not the only area on the ship that could benefit from speech recognition systems. A voice activated expert system for controlling engineering propulsion plant casualties would greatly enhance the reduced manning policy on the automated gas turbine powered ship classes.
Remote activation of damage control (DC) or firefighting equipment by personnel outside the damaged space could reduce the risk of damage to sailors and equipment. The list could continue. Salfer (1985) presents a more detailed analysis of applications of ASR systems onboard the FFG-7 class ships which could be expanded to include other classes of ships as well. The underlying reason for pointing out various other areas for speech applications is to stimulate awareness and generate other ideas for applications of this technology. It is important to note that regardless of how much faster or better a system can work employing automatic speech recognition technology, if the user and management do not have the motivation to examine such a system, this equipment, like others, would have no hope for success.

E. FUTURE

Speech recognition in no way should be considered stagnant. Manufacturers and corporations are more than ever wanting to reap the benefits of this technological field. As the awareness and knowledge of this technology become more widespread, especially in man-machine interfaces, a greater proliferation of systems will be seen. The new horizon for speech recognition systems is to move away from template matching schemes to the more flexible phonetic recognition. The basis of phonetic systems is phonemes, the basic units of all speech. Once the system is trained on words utilizing all the combinations of phonemes, the formulation of any word is possible. For example, this phrase, taken from Speech Systems Incorporated advertising literature, "continuous speech development toolkit" would look like this phonetically: "kantinyuasspichdivelapmentulkit". The phonemes are then converted by different syntactic and dictionary builders in a computer which produce the correctly formulated string. At the 1986 American Voice Input/Output Society (AVIOS) convention, there was only one vendor, Speech Systems Incorporated, marketing a phonetic recognition system.
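The dictionary-builder step described above, which turns a phoneme stream back into orthographic words, can be sketched as a greedy longest-match lookup. The phonetic spellings below are illustrative assumptions (and the input doubles the shared 't' of "development toolkit" so the toy segmenter splits cleanly); a real phonetic recognizer uses a proper pronunciation lexicon and syntax.

```python
# Hypothetical pronunciation lexicon mapping phonetic spellings to words.
LEXICON = {
    "kantinyuas": "continuous",
    "spich": "speech",
    "divelapment": "development",
    "tulkit": "toolkit",
}

def phonemes_to_words(stream):
    """Greedy longest-match segmentation of a phonetic string into words."""
    words, i = [], 0
    while i < len(stream):
        for j in range(len(stream), i, -1):     # try the longest chunk first
            if stream[i:j] in LEXICON:
                words.append(LEXICON[stream[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no lexicon entry starting at position {i}")
    return " ".join(words)

print(phonemes_to_words("kantinyuasspichdivelapmenttulkit"))
# continuous speech development toolkit
```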
It is the first commercial system of its type. It is surely the trend of future speech recognition/understanding systems and it is one focus of Department of Defense funding. In addition to industrial and university research, the Defense Advanced Research Projects Agency (DARPA, formerly ARPA) is sponsoring another multi-million dollar contract titled the Strategic Computing Program. A major part of the Strategic Computing Program is the integration, transition, and performance evaluation of speech technology. "The speech recognition portion of the Strategic Computing Program is divided into two major areas: continuous speech recognition and robust, connected-word recognition . . ." (Strategic Computing, 1985). The aim of this program is to make continuous speech recognition a realization. The major thrust would be in the area of phonetic recognition to deal with speaker variation, large vocabularies, natural grammars, and real time response. In the area of robust speech recognition, the objectives are to improve upon current systems' capacity to deal with variations and distortions of the input speech signal in severe acoustic noise and the physiological/psychological stress found in military applications. (Strategic Computing Program, 1985) Increased use of computers in problem solving will demand more emphasis on man-machine interfaces. Speech recognition will be that interface which makes the computer a true extension of man. We communicate with each other by speech, so it should only be expected that we can do the same via a computer. This cursory look at speech types and speech related terminology is meant only to familiarize the reader with terms to be used later and to introduce the ever broadening future of speech input/output systems.

IV. TEST, ANALYZE, AND EVALUATE THE SRI 'BERKELEY' SPEECH BOARD

This chapter describes a series of tests whose purpose was to confirm the voice recognition performance of the SRI 'Berkeley' board as reported in Murveit (1986).
The results of the SRI study suggest that a 1000-word discrete speech recognition system does not sacrifice accuracy despite the high processing speeds necessary for large vocabulary recognition. Their report indicates that the Berkeley speech board system achieved a recognition accuracy of over 90 percent for a 1000 word vocabulary and over 99 percent for a sixteen word vocabulary. In addition, this chapter will examine the algorithms used by the speech board for initial template creation, voice recognition, and error correction.

A. DESCRIPTION

SRI selected the 'Berkeley' board because it was the state of the art in large vocabulary speech recognition. A recognizer of this type was a necessary requirement in a CCWS for a faster and more natural man-machine interface in command entry and database access. Specifically, the research conducted by SRI was for the enhancement of speech interfaces for natural-language data-base-management tools. In cooperation with U.C. Berkeley, SRI modified the design slightly and interfaced it to the SUN-170 Microsystems computer.

B. THE SUN-170 MICROSYSTEMS WORKSTATION

The SUN-170 Microsystems workstation is a UNIX based computer system. These workstations are used in a variety of applications. The value of workstations was realized with the increase in computer power provided by the development of 16 and 32 bit microprocessors. A typical workstation will generally consist of a 1 MIPS (million instructions per second) CPU, 2-4 Megabytes of memory, a high resolution (1000 by 1000 pixels) display, a keyboard, and a mouse. The speech board is interfaced to the SUN and receives the audio input directly. The workstation used in this experiment is the host computer on the Department of Defense Network (DDN) at address SRI-BOZO.
There are several inherent attributes, like file transfer protocol (FTP) and telnet (TN), resident on the DDN network which allowed remote work on the vocabulary and data processing from the Naval Postgraduate School.

C. MARA

MARA comprises the hardware and software components that integrate the speech recognizer into the workstation. The MARA system consists of:

• the computer and its programs
• the speech recognizer
• the user

The MARA hardware consists of a Multibus PC board, a backplane with a connector, a BNC cable, a pre-amplifier, and a microphone. The software components include:

• the PC board program (mara86.com)
• the MARA daemon (mara)
• the low level recognition command library (libmara.a)
• the standard library (libmara.a)
• support libraries for various applications (libmarawindow.a)

The MARA system in the broadest sense is the combination of equipment and programs that are referred to as the SRI 'Berkeley' board. (Kavaler, 1986)

D. THE SRI 'BERKELEY' BOARD

The speech recognition board, as its name implies, is a single circuit board. This board is built with a multibus interface and is modified to be inserted directly into the SUN Microsystems computer workstation. The speech board is divided into two separate subsystems. The front-end subsystem manipulates the input into a form to be analyzed by a comparator subsystem where the voice templates are stored.

1. Front End

The utterance, in the form of a frequency vs. time signal, enters through a series of 16 bandpass filters. The outputs are rectified and then low-pass filtered over a period of time. The signal is then divided into 10 millisecond frames. Each frame ". . . is the average voltage a speech signal has in several frequency bands. The system computes speech frames at a rate of one hundred times a second." (Murveit, 1986) During the process of computing the frames, it checks whether or not a word is really being spoken (referred to as endpoint detection).
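The framing step of a filterbank front end like the one described above can be sketched as follows. This is an assumed, simplified stand-in: two crude "bands" computed from sample magnitudes replace SRI's sixteen analog bandpass filters, and the 8 kHz sample rate is an assumption; only the 10 ms / 100 frames-per-second framing mirrors the description.

```python
import math

RATE = 8000                 # samples per second (assumed for the sketch)
FRAME = RATE // 100         # 10 ms frame -> 80 samples, 100 frames/second

def frames(signal):
    """Average rectified energy per 'band' for each 10 ms frame."""
    out = []
    for start in range(0, len(signal) - FRAME + 1, FRAME):
        chunk = signal[start:start + FRAME]
        # Crude two-band split: overall magnitude as a low-band proxy,
        # sample-to-sample change as a high-band proxy.
        low = sum(abs(x) for x in chunk) / FRAME
        high = sum(abs(chunk[i] - chunk[i - 1]) for i in range(1, FRAME)) / FRAME
        out.append((low, high))
    return out

# 100 ms of a 440 Hz tone yields ten 10 ms frames.
tone = [math.sin(2 * math.pi * 440 * t / RATE) for t in range(RATE // 10)]
print(len(frames(tone)))    # 10
```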
Assuming that a word is being spoken, the system varies the spectral sampling rate dynamically. The spectral differences of adjacent frames are then compared, and if the distance is insignificant the frame is discarded. This technique is called selective downsampling, and it reduces the data rate through the system, particularly for the long steady-state sounds in words. The result of disregarding the insignificant frames in this manner is improved accuracy, real time vocabulary processing, and expanded template storage memory. The front end subsystem then downloads the frames into the comparator.

2. Comparator

As the name implies, this subsystem compares the incoming frame with those already in memory. This is accomplished by a technique called dynamic time warping. The input frames are compared with the reference frames of the words in memory. The sum of the differences of their spectral distances is computed. A score or cost for each and every word in memory is then computed and the minimum value is sought. The lower the score computed by the algorithm, the better the recognition. As discussed in Chapter 3, if the score is below a rejection threshold then the string specified for the word is output. If the word score is above this value a non-recognition occurs.

E. SUBJECTS

One civilian and one military officer participated in the testing of the SRI speech board. Both subjects were male, 32 to 46 years old. The civilian (M1) was very experienced with many types and models of speech recognition systems, while the military officer (M2) had less than 12 hours total exposure to speech systems.

F. TRAINING ALGORITHM

The training was conducted in a low noise speech lab at SRI utilizing a SHURE SM-10 close-talking microphone. A training algorithm was used to develop the templates for each speaker.
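The dynamic time warping comparison described in the Comparator subsection above can be sketched as a minimum-cost alignment. The one-dimensional "frames," the toy templates, and the threshold are assumptions chosen to keep the example short; the board compares multi-band spectral frames.

```python
import math

def dtw_cost(a, b):
    """Cumulative cost of the best alignment of frame sequences a and b."""
    inf = math.inf
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])              # local spectral distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch input
                                 cost[i][j - 1],      # stretch template
                                 cost[i - 1][j - 1])  # step both
    return cost[len(a)][len(b)]

def recognize(frames, templates, threshold):
    """Lowest DTW score wins; scores above the threshold are non-recognitions."""
    scores = {word: dtw_cost(frames, t) for word, t in templates.items()}
    best = min(scores, key=scores.get)
    return best if scores[best] <= threshold else None

templates = {"alpha": [1, 3, 5, 3, 1], "bravo": [5, 5, 1, 1, 5]}
# A slightly time-stretched 'alpha' still aligns at low cost.
print(recognize([1, 1, 3, 5, 3, 1], templates, threshold=2.0))   # alpha
```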
This speaker dependent system requires the user, via the training algorithm, to specify how many training passes are desired as well as the "cluster" size and method of input. This would allow one to input utterances from a tape recording and have the algorithm form templates on a fixed number of passes from the recording. The cluster size is an averaging technique which is the essential ingredient in creating templates. To form a cluster, an initial template (usually the first training pass) is compared against another utterance for that word or phrase. The spectral distance is calculated and compared to the initial utterance(s) in memory. If the minimum average distance is less than the distance specified in the algorithm, then one template is formed. Otherwise the system will indicate that a template could not be formed since the spectral clusters were outside the limits. The trainer program will then prompt for more repetitions in an effort to generate a single template. If after three more repetitions a single template still could not be created from the additional utterances, two templates for the same word are computed. Each template and spoken word is placed alphabetically in a Unix directory. The templates are indicated by a .t file suffix while the utterances are identified by a .u suffix. For example, if the word "advisory" is spoken twice in creating one template, one would find the files advisory.t1, advisory.u1, and advisory.u2. This is unique to this system and the advantages of this scheme will be evident later in this chapter.

G. THE VOCABULARY

Any vocabulary file can be created by specifying the word prompt followed by two colons, then the keystrokes or output string. This file is in the working directory and is specified when invoking the trainer algorithm. In this particular experiment the subjects used a 100 word initial vocabulary taken from the 1000 word set used by SRI (Appendix A).
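A reader of a vocabulary file in the "prompt::output string" form described above could be sketched as follows. The sample entries and the exact file syntax beyond the double-colon separator are assumptions drawn from the description, not taken from the actual SRI trainer.

```python
# Sketch of parsing a vocabulary file where each line is
#   <word prompt>::<keystrokes or output string>
# as described in the text; blank lines are skipped.

def parse_vocabulary(text):
    """Map each word prompt to the output string it should trigger."""
    vocab = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        prompt, _, output = line.partition("::")
        vocab[prompt.strip()] = output.strip()
    return vocab

sample = """\
advisory::ADVISORY
defense condition::DEFCON
"""
vocab = parse_vocabulary(sample)
print(vocab["advisory"])            # ADVISORY
```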
A second vocabulary, which was used in extensive studies conducted at the Naval Postgraduate School (Poock, 1981, 1986a), was sent directly to the SUN workstation at the host (SRI-BOZO) via the DDN. This vocabulary of 240 utterances is shown on the data sheet in Appendix B. It is divided into five groups of words based on the number of syllables. There were 10% one syllable words, 30% two syllable words, 20% three syllable words, 20% four syllable words, and 20% five or more syllable words. These words were selected from commands typically used in a command center.

H. PROCEDURE AND DATA COLLECTED

Several different testing periods were scheduled over a three month period. Both subjects traveled to the SRI International building in Palo Alto, Ca. to participate in the testing. The session started by logging onto the SRI-BOZO host via the Sun Microsystems computer terminal. The appropriate windows were displayed and the MARA system was automatically enabled during the login sequence. The trainer program was used only once for each vocabulary. One user (M1) used three training passes while the other user (M2) used only two passes. There was no need throughout the three months to retrain the vocabularies. A selective retraining of several words was accomplished to demonstrate the ease of retraining or adding new words. Under the main directory of XPS were the template subdirectories POOCK.TEMPLATES and MIKE.TEMPLATES. The word recognition program was enabled and the file of 100 words or 240 words was called. The program automatically searched the alphabetical subdirectories and loaded the proper templates onto the speech board. It took an average of 130 seconds to load the 240 word templates. For data collection purposes each session was recorded to a file with the lowest five words and their scores for each utterance. When possible, the other subject would record errors as he witnessed them to confirm the recorded data.
Additionally, any abnormalities or peculiarities the system would display would be more apparent to the observer and thus free the subject to concentrate on the word list. In an effort to demonstrate the robustness of the system, the different lists were read at varying speeds. The vocabulary was tested forward, backward, and randomly at both a normal speaking rate and then at a significantly quicker pace. In addition, the subjects attempted to demonstrate the interoperability of the same voice patterns between the two subjects by using each other's templates. A joint template was attempted but, due to the relatively small spectral distance allowed in the training algorithm's cluster averaging technique, after four passes no single joint template could be created. Several runs were conducted in a noisy environment. A cassette tape of machinery noise was played at a level of 74 dB(A) at the microphone. This level is considerably higher than one could expect in a command and control environment, even in a shipboard tactical decision center. The vocabulary can easily be modified by editing the file. If a file is modified to include a word not yet trained, the speech program indicates that it could not find a template for that word. Otherwise, it would load any template that was specified in the vocabulary regardless of whether or not it was trained at the same time or as part of another vocabulary. During one of the testing periods, the subjects used a syntactic feedback system demonstrated by SRI to NAVELEX in July 1984. (Murveit, 1986) The syntactic feedback system is a specially designed algorithm to correct recognition errors in a sentence. The grammar is structured as a finite state machine with beginning, end, and transition states. The program would compute the least-cost path through a series of weighted arcs and then select the recognized sentence.
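The least-cost path computation described above can be sketched with a standard shortest-path search over the grammar's weighted arcs. The tiny grammar, the word choices, and the arc weights here are hypothetical stand-ins for recognition scores; SRI's actual grammar and scoring are not reproduced.

```python
import heapq

# Sketch of syntactic feedback as a least-cost path through a finite state
# grammar: arcs map state -> [(next_state, word, cost)], where cost stands
# in for how poorly the recognizer scored that word hypothesis.

def least_cost_sentence(arcs, start, end):
    """Dijkstra search returning (sentence, total cost) for the cheapest path."""
    heap = [(0.0, start, [])]
    seen = set()
    while heap:
        cost, state, words = heapq.heappop(heap)
        if state == end:
            return " ".join(words), cost
        if state in seen:
            continue
        seen.add(state)
        for nxt, word, weight in arcs.get(state, []):
            heapq.heappush(heap, (cost + weight, nxt, words + [word]))
    return None, float("inf")

arcs = {
    "S": [("Q", "show", 1.0)],
    "Q": [("E", "defcon", 5.0),    # poorly scored word hypothesis
          ("E", "defense", 2.0)],  # grammar prefers the cheaper arc
}
print(least_cost_sentence(arcs, "S", "E"))   # ('show defense', 3.0)
```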
For instance, in a data base query, if a word or words were misrecognized by the recognition system, it could be corrected by the syntactic feedback algorithm. Throughout the testing period it was evident that a good background in the UNIX operating system and familiarity with the MARA system were major prerequisites to effective use of the speech recognition system. Software improvements in user interaction and a well written operating manual for reference would have been helpful.

I. RESULTS

1. Accuracy

Results for the 1000 word vocabulary tests conducted by SRI reported in Murveit (1986) are shown below in Table 1. M1, M2, F1, and F2 refer to individual male and female subjects. The percentages refer to word recognition.

TABLE 1
SRI 1000 WORD RECOGNITION PERFORMANCE
M1   89-91 %
M2   89-93 %
F1   91-93 %
F2   86-90 %

The data shown in Table 2, reproduced from Murveit (1986), reflect the results of SRI's speech recognition system utilizing the TI 20-word data base used to test commercial speech recognition systems. (Doddington and Schalk, 1981) The results of the tests conducted by our subjects appear in Tables 3 through 6. These tables represent the trials with variability in speech speed and no maximum rejection threshold specified. A two sample t test utilizing an arcsin transformation was completed using the MINITAB statistics package, showing no significant difference between the two means of our subjects at the 0.05 level of significance. (Minitab, 1981)

2. Interoperability of Voice Patterns for Different Users

The results of the interoperability tests are shown in Table 5, showing an obvious decrease in accuracy.
The computed scores or differences between the recognized words and the templates were on the average 10 points higher than the mean of their scores with their own templates.

TABLE 2
SRI TI DATA BASE PERFORMANCE
16 speakers, 320 utterances each
13 total errors; .25% mean error rate

TABLE 3
NPS 100 WORD VOCABULARY TEST
M1   94-98 %    8 TRIALS    AVG 96 %
M2   91-99 %    12 TRIALS   AVG 97 %

TABLE 4
NPS 240 WORD VOCABULARY TEST
M1   95-100 %   7 TRIALS    AVG 97 %
M2   98-100 %   7 TRIALS    AVG 99 %

TABLE 5
INTEROPERABILITY TESTS
M1 using M2 templates   80-89 %   3 TRIALS
M2 using M1 templates   78-86 %   3 TRIALS

3. Accuracy in a Noisy Environment

The endpoint detection process which is computed in the front end section of the card also keeps track of the background noise level and effectively ". . . eliminates moderate room noises and maintains proper signal levels in the converter and analysis circuits." (Murveit, 1986) The background noise elimination features of the microphone and the system allowed it to perform with virtually no degradation in recognition performance. It is interesting to note that the system was not capable of any recognition at approximately 76 dB(A). Table 6 shows the results in a noisy environment.

TABLE 6
NOISY ENVIRONMENT
M1   99 %      2 TRIALS
M2   96-98 %   2 TRIALS

J. SYNTACTIC FEEDBACK

During one testing session the subjects exercised the syntactic feedback system using a limited vocabulary and allowable sentence structure. There are a number of questions which are suggested by Murveit (1986). These issues should be pursued, since there is an increase in accuracy realized in using this algorithm.

K. CONCLUSIONS AND RECOMMENDATIONS

The purpose of these tests was to examine the voice recognition performance of the SRI 'Berkeley' 1000-word discrete speech recognition board. The results of our testing confirm the results reported by SRI Project 6096. (Murveit, 1986) Their 1000-word speech recognition system is very accurate and quite fast.
Throughout the entire study, no degradation of the templates occurred. The experiment was conducted entirely on initial templates. Despite the variability in speaking rate, three months of intermittent testing, and testing in a noisy environment, the system performed proficiently. However, the SRI 'Berkeley' board in its present configuration does not meet all the requirements necessary to be a viable interface in the CCWS. In spite of commercial discrete speech recognition system vendors advertising an input rate of 60 words per minute, discrete speech recognition systems are not suitable for a command and control environment. The user must modify his speaking rate by pausing after each utterance to effectively use the system. It would be insensitive to the ultimate users in a CCWS environment to assume that discrete utterances in a high tempo, high pressure, and possibly high threat situation are even remotely acceptable. A connected or even a continuous speech recognition system is the only suitable alternative. This gives the commander the best opportunity to process information quickly and accurately, allowing him more time to make a timely and knowledgeable decision.

V. TEST, ANALYZE, AND EVALUATE A COMMERCIAL CONNECTED VOICE RECOGNITION SYSTEM IN A WARGAMING ENVIRONMENT

The previous chapter analyzed the reliability of a 1000 word discrete speech recognition system. The SRI speech board is a state-of-the-art system which was quite good and very accurate. The disadvantage was, of course, utilizing a discrete system in a command and control environment. The purpose of this chapter is to analyze the performance of a relatively inexpensive, commercially available continuous speech system. The VOTAN 6050 Model II product was examined for its applicability and adaptability to a command and control environment, in particular the Naval Warfare Interactive Simulation System (NWISS).
VOTAN has been used in many experiments, tests, and applications and is regarded by many as a very capable speech recognizer. For example, this same recognizer was demonstrated in the Navy's air traffic control trainer and simulator and performed quite well. The VOTAN was used in this experiment to focus on four major areas:

(1) Demonstrate an application of a continuous speech system in a command and control environment similar to a workstation module.
(2) Investigate any significant differences in the ability to input commands by speech or keyboard entry.
(3) Investigate the possibility of utilizing a speech recognition system in Navy tactical trainers to overcome the dead time in learning the game command keystrokes and entry procedures.
(4) Investigate any significant differences in speed of command entry for users familiar with standard Navy phraseology versus those unfamiliar with using speech recognition systems.

There is considerable time expended at every tactical trainer by the users in familiarizing themselves with the equipment and game command entry procedures. This 'dead' time could be eliminated by using a standardized vocabulary, as used in Navy contact reporting procedures, and incorporating speech recognition to minimize keyboard operation and special game commands. The result would be an increase in useful tactical trainer time. Before examining the VOTAN speech system we will briefly describe NWISS and its similarities to the proposed specifications for the Command and Control Workstation (CCWS).

A. DESCRIPTION OF THE NAVAL WARFARE INTERACTIVE SIMULATION SYSTEM (NWISS)

NWISS is a real-time, user-interactive simulation of naval warfare. Its mission was originally to train senior Naval Officers in force-level tactical decision making and management of command and control.
The NWISS game resides on a VAX 11/780 computer, with a network of peripheral VT100/102 and ADM31 terminals and RAMTEK graphics terminals providing the necessary displays and interactive stations. The equipment is located in the Naval Postgraduate School Wargaming Analysis and Research (WAR) Laboratory. There is a sufficient amount of equipment to support three separate bays or areas to simulate disjoint command and control modules. The equipment available in the wargaming and research laboratory is very similar to the equipment for the CCWS. The Distributed Command System (previously shown in Figure 1.1) includes the Interim Battle Group Tactical Trainer (IBGTT), which is a component to be interfaced into the local area network. NWISS is to be integrated into the IBGTT network in 1987. In applying a continuous speech capability to the NWISS, we can analyze the requirements for a continuous speech system in a C2 environment. The RAMTEK monitor is the display system used in the NWISS modules. The presentation is essentially a typical Naval Tactical Data System (NTDS) picture with some exceptions and is similar to the display envisioned for the CCWS. All ships, planes, and submarines are displayed utilizing standard Navy symbology as shown in Figure 5.1, with some differences. The exceptions to the standard shipboard NTDS console display are summarized below:

• NWISS has color enhanced symbology (an excellent screen improvement).
• The track symbology in NWISS does not reflect engagement status of tracks. Track information is available only on display boards and is not accessible from the graphic display screen.
• Electronic (ESM) and acoustic (SONAR) emission lines of bearing are color coded as well. Old tracks change to yellow to indicate a fading track.
• NWISS does not have the symbology available in NTDS to indicate type of platform.
• NTDS has balltab capability for immediately obtaining information on the status of tracks.
The color scheme displays all known friendly forces in blue, enemy forces in red, and unknown contacts in white, with fading tracks indicated in yellow.

Figure 5.1 NTDS Symbology.

B. SCENARIO

The scenario for the NWISS game was designed to place subjects in situations requiring the input of many combinations of the various commands available. It was the first exposure for most of the subjects to a multi-threat Naval wargame, since it was the introductory simulation course for students of the Naval Postgraduate School Command and Control curriculum. Each group of students embarked in separate aircraft carriers or command and control modules. The objective was designed to demonstrate:

• High resolution color graphics
• Friendly man-machine interface
• The level of detail required to plan, run, summarize, and analyze a relatively low level wargame
• The NPS WAR Lab capabilities

Additionally, the purpose of each of the runs was to familiarize the subjects with the game and experiment with the various commands and display boards. The actual situation briefing used in these tests is included in Appendix C.

C. VOTAN SPEECH RECOGNITION SYSTEM MODEL 6050 SERIES II

The VOTAN VTR 6050 Series II is a stand-alone unit which can interface with any system supporting a standard RS-232 port. It has the ability to operate in two distinct modes: Voice Terminal (VTR) and Voice Peripheral (VP). The VTR mode allows the equipment to interface directly between a terminal and a host. This is the mode that was used in the NWISS game with an ADM31 terminal and the VAX 11/780 as host. The configuration to run NWISS with the VOTAN appears in Figure 5.2.
The VP mode is designed for telephone-based applications. This mode was not used in this experiment and will not be discussed.

Figure 5.2 Configuration To Run NWISS With The VOTAN.

1. Vocabulary Size

The VOTAN 6050 Series II has three internal components which support its vocabulary. These are:

• VTR system memory (approximately 500K)
• Floppy disk memory (maximum of 760K)
• Voice card memory (maximum of 22K)

In addition to these components there is also the possibility of storing voice files on the host computer. This was not used since the vocabulary was small enough to be stored directly in system memory. The average word or template uses 200-250 bytes of memory. When the system is fully loaded, there can be 2000-3000 words in main memory. It is important to note that all voice recognition takes place on the voice card. The voice card can accommodate up to 50 words (from the 2000-3000 in main memory) at a time. A tradeoff can be seen in the number of words vs. the number of templates for each word: the more accuracy required, the more templates needed for each word, and the fewer words loaded into each active set. The main memory can contain multiple sets, and it takes only about 150 msec to upload a set onto the voice card memory. The vocabulary can be tailored to switch automatically upon hearing a switch word, or to switch when a certain number of words are recognized from an on-line set. A switch is a mnemonic that is spoken by the user to load the voice card with a specific set of templates. This file is transferred at a rate of 9600 baud. During the upload period the VTR automatically records speech (up to 7 seconds) to be searched immediately upon completion of the swap. The swap is extremely fast and is virtually unnoticed by the user.
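The words-versus-templates tradeoff above can be made concrete with back-of-the-envelope arithmetic, using the figures quoted in the text (22K of voice card memory, 200-250 bytes per template, a 50-word limit per active set). The 225-byte midpoint is an illustrative assumption.

```python
# Template budget for a voice card, using the figures quoted in the text.
CARD_BYTES = 22 * 1024        # voice card memory (22K)
BYTES_PER_TEMPLATE = 225      # assumed midpoint of the 200-250 byte range

def max_templates(card_bytes=CARD_BYTES, per_template=BYTES_PER_TEMPLATE):
    # How many templates fit on the card at the assumed size
    return card_bytes // per_template

def fits(words, templates_per_word):
    # More templates per word -> better accuracy, but fewer distinct
    # words can share the card at one time
    return words * templates_per_word <= max_templates()
```

Under these assumptions roughly 100 templates fit on the card, so a full 50-word set can carry only about two templates per word, while a 36-word alphanumeric set with five templates per word (as produced by continuous training, discussed below) cannot fit at all.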
It is recommended in the VOTAN Guide To Procedures that one should limit the number of words in a set to about 10 to 20. A set of this size will optimize recognition and provide a quicker system response time.

2. Programming

The VTR 6050 Series II can be easily programmed. The key element in optimizing the performance of the system is careful construction of the vocabulary so as not to exceed the voice card memory limitations and to minimize set changes. With the VTR in the off-line mode (which blocks any keystrokes from going to the host), a vocabulary is entered directly onto the screen in an editor mode. The user specifies the file name and then begins entering headings for the word sets followed by the actual words in the set. The following is an example of a small file which is included to show the various programming commands available: (VOTAN GTP and UG, 1985)

EDT NUMBERS      *(this allows you to enter the EDITOR mode)
S=NUMBERS,       *(this specifies the set name NUMBERS)
NS=COLORS,       *(this is the pointer to the NEXT SET: COLORS, which is)
CT=2,            *(automatically loaded after 2 recognitions of this set)
CM               *(indicates NUMBERS is a COMMON word set always in memory)
ONE,HS=1         *(ONE is the prompt and 1 is the string sent to the host)
TWO,TS=2         *(TWO is the prompt and 2 is the string sent to the terminal)
THREE,TS=3\20    *(the \20 is the hexadecimal string for a space to be sent to the terminal after the 3)
FOUR,HS=4        *(FOUR is the prompt and 4 is the string sent to the host)

Appendix D is the listing of the vocabulary used for the NWISS game and will be discussed later in the chapter.

3. Operation

While the VOTAN 6050 Series II is still in the off-line mode, the user's vocabulary and templates are placed into memory.
In addition to the set in memory there are certain words called TASK WORDS which control operation of the VTR when it is on-line, and a collection of words in the user-tailored vocabulary which can be designated as COMMON words; both count against the total allowed templates on the voice card memory. The user can specify an initial word set that will be activated each time the system is initialized. Additionally, the user can specify whether or not data buffering should be used. Data buffering allows the system to store a predetermined number of strings or characters before outputting them to the host. Data buffering can be extremely beneficial when a user needs to verify a string of words prior to its being sent. Numerous military situations require validation of codes or strings to ensure proper actions upon receipt. The default condition is immediate action when the word or phrase is recognized. These are one-time preliminary set-up inputs. Once this is accomplished the system is ready to be put in the on-line (ONL) mode. This sends the host string directly to the computer upon recognition. The keystrokes are then returned by the host and displayed on the screen. The keyboard can still be used, and the VOTAN is transparent to the user when passing these keystrokes directly to the host.

4. Training Algorithms

The VOTAN 6050 Series II offers two types of training algorithms: single/discrete training and continuous training. In the single training mode, one template is formed after each utterance. The continuous training method extracts templates from a series of passes for each word in the set. This takes into account the coarticulation of a word at the beginning, middle, and end of a group of words. Prior to entering the continuous training mode, the user must have at least two single-trained templates available for template extraction to occur. The user specifies the set which he would like for continuous training.
The algorithm then automatically selects up to ten words at a time and presents to the user a series of five of these words in random order on the screen. The user repeats all five words in a continuous manner. The system will then display two columns of words if a sufficient number of words were recognized. The first column lists the words that were displayed as the prompts. The second column contains the words that the system recognized. Several misrecognitions may be observed; however, the algorithm uses the other correctly recognized words for forming the extracted templates. This ability to develop extracted templates enables VOTAN to make the claim of having a continuous recognizer. The operator can manipulate the presentation during continuous training to ascertain the progress of completion of a recognition matrix for the current set of words being trained. The matrix has three columns for each word indicating where the word occurred in a string of words (i.e., beginning, middle, or end). On some training passes an insufficient number of words will be recognized, and the system will prompt the user to continue training a new set. After a certain number of passes, or when the matrix is completely filled, the program will terminate the training of that word group and continue with the next set of ten words. Prior to operating the system in VTR mode, which transmits the output strings to the host or terminal, the user can invoke a program to test his templates and to ensure voice card storage has not been exceeded. The output display upon recognition consists of the recognized prompt characters and the recognition score. The recognition score is computed from the spectral distances between the template and the spoken word. As with the SRI system, the lowest score indicates the best recognized word. The recognizer has a default minimum recognition threshold of 50, but the user can modify this value if desired.
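The scoring rule just described (lowest spectral distance wins, with a rejection threshold) can be sketched as follows. The frame vectors, the distance measure, and the threshold value are illustrative assumptions, not the VOTAN internals; a real recognizer would also time-align frames, e.g. with dynamic time warping.

```python
import math

def spectral_distance(frames_a, frames_b):
    # Sum of Euclidean distances between paired spectral frames
    return sum(math.dist(a, b) for a, b in zip(frames_a, frames_b))

def recognize(utterance, templates, threshold=50.0):
    # Score every template; the lowest total distance is the best match,
    # and matches above the rejection threshold are discarded
    scored = {word: spectral_distance(utterance, frames)
              for word, frames in templates.items()}
    best = min(scored, key=scored.get)
    return best if scored[best] <= threshold else None

# Toy two-frame "spectra" for two templates
templates = {"ONE": [(1.0, 2.0), (2.0, 1.0)],
             "TWO": [(9.0, 9.0), (8.0, 8.0)]}
word = recognize([(1.1, 2.1), (2.0, 1.2)], templates)
```

An utterance far from every template scores above the threshold and is rejected rather than forced onto the nearest word.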
This level appears to be quite adequate for most applications.

D. SUBJECTS

Six male officers participated in this experiment. Five were Naval Officers from various communities. Three had previous experience with the modeled systems and were familiar with the terminology of giving similar orders. These were the individuals used in validating familiarity with battle group phraseology vs. having no experience. All but one of the officers had less than 12 hours total exposure to voice recognition systems. The other officer had about 100 hours experience with various voice systems.

E. THE VOCABULARY

The vocabulary for the NWISS wargame consists of two major groups of commands: DISPLAY and ACTIVE. The DISPLAY commands control all aspects of the graphic plot as displayed on the RAMTEK monitor. The ACTIVE commands consist of many different orders that could be given to ships, submarines, and aircraft. There are a total of 230 allowed words that are recognized by the NWISS game. The NWISS game requires that the commands be ordered in a particular way. For example, after ACTIVATE, the game would expect to see 7 different commands and would disallow other inputs. These same words could appear in different positions in different correct commands to the host (this plurality in commands occurs throughout the vocabulary). In addition, the number of options after identifying a force name can range up to 50-60 possible commands, greatly exceeding the limitation of the voice card. This peculiarity required a more general tailoring of the vocabulary to model the NWISS word structure, since one could not tailor the vocabulary into finite sets allowing only a small number of words to follow other words. This is similar to the problem SRI experienced in formulating the finite states used in the syntactic feedback system.
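The switch-word mechanism that this vocabulary design leans on can be sketched as a toy model. Set names, switch words, and host strings below are illustrative only; this is not the VOTAN firmware, just the mechanism of swapping the active set and buffering host strings.

```python
# Toy model of active-set switching and data buffering.
class VoiceSetModel:
    def __init__(self, sets, switches, buffer_size=0):
        self.sets = sets              # set name -> {spoken word: host string}
        self.switches = switches      # always-resident switch words -> set name
        self.active = next(iter(sets))
        self.buffer_size = buffer_size
        self.buffer = []
        self.sent = []                # strings "sent to the host"

    def hear(self, word):
        if word in self.switches:     # switch word: swap the active set
            self.active = self.switches[word]
            return
        host_string = self.sets[self.active].get(word)
        if host_string is None:
            return                    # not in the active set: no output
        if self.buffer_size:
            self.buffer.append(host_string)
            if len(self.buffer) >= self.buffer_size:
                self.sent.extend(self.buffer)   # flush buffered strings
                self.buffer.clear()
        else:
            self.sent.append(host_string)       # default: immediate action

model = VoiceSetModel(
    sets={"NUMBERS": {"ONE": "1", "TWO": "2"},
          "COLORS": {"RED": "r"}},
    switches={"COLORS": "COLORS", "NUMBERS": "NUMBERS"},
)
model.hear("ONE")      # in the initial NUMBERS set
model.hear("COLORS")   # switch word: COLORS becomes the active set
model.hear("RED")
```

In the NWISS vocabulary, the 50-60 options that can follow a force name exceed what any single 50-word set can hold, forcing frequent switches of this kind.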
Consequently, it was impossible to formulate the vocabulary within the memory and template limitations without multiple switch words. Appendix D is the listing of the vocabulary used in this experiment. Note that there are six major vocabularies or sets: Display, Ships, Commands to Units, Numbers, Aviation, and Load. This was done to minimize the number of switches necessary for full use of the commands. For example, an actual voice command for activating an air search radar utilizing the VOTAN would be:

SHIPS SPRUANCE ACTIVATE AIR NUMBERS 1245 ENTER (6 sec)

The bold words are the switch words for the two sets. The same command by keyboard entry is:

FOR SPRUA ACTIVATE AIR 1245 (28 keystrokes) (~10 sec) (NWISS, 1983)

F. PROCEDURE

The training was conducted in the WAR Lab at the same input terminal to be used for the game. A SHURE SM-10 close-talking microphone was used for the training and game play. The subjects used in the experiment were trained in individual sessions on the VOTAN speech recognizer. The training took place in one session which averaged approximately 75 minutes. The enrollment started by loading copies of the commands as shown in Appendix D into active memory without any templates. An overview of how the training was to be conducted was given, including proper microphone placement and a description of the vocabularies. Each subject started by generating two single-trained templates for the set NUMBERS (this set included all numbers 0-9 and letters A-Z). The set NUMBERS was anticipated to require continuous training because of the extensive use of alphanumerics in commands. Following the individual training of this set, the continuous training algorithm was invoked. Displaying the continuous training matrix during training led to the discovery that the algorithm is not sophisticated enough to determine exactly what order it should present the group if there are only a few unfilled blocks left in the matrix.
This can be time consuming, especially if the processor is experiencing difficulty in developing an extraction template for a particular word. Upon completion of continuous training there were five templates for each word in the set. It became apparent that this number would far exceed the number allowed on the speech card, and therefore all single templates were erased. The remaining words were presented for two sets of single/discrete training passes. After all word sets were trained, each set was displayed with the total number of templates and memory used. TASK words and COMMON words reside on the voice card at all times. In all cases, three of the six possible sets had exceeded usable memory, as shown in Table 7.

TABLE 7
INITIAL TEMPLATE LISTING
VOCABULARY SET       MEMORY (BYTES)   AVERAGE NUMBER OF TEMPLATES
TASK_WORDS             1651             8
COMMON                 2916            18
NUMBERS*              18740           133
COMMANDS_TO_UNITS     23675           148
AVIATION              25382           142
SHIPS                  8900            50
LOAD                  16229            86
* SINGLE TRAINED TEMPLATES NOT INCLUDED
EXCEEDS VOICE CARD LIMITATIONS (COMMON AND TASK_WORDS INCLUDED)

A review of the vocabulary and sets showed that 28 words were intentionally duplicated in the composition of the sets. This design redundancy was to reduce the number of switches needed for the formulation of proper commands. Consequently, there were actually four separately trained templates for each of these words in storage. Two of the templates for each of these words were deleted from the active sets. On average, 45 additional templates were deleted to bring the memory and number of templates within limits. The words reduced to only one template were those with many syllables that were readily recognized. The actual number removed varied according to the user and the way each word was enunciated. That is, if utterances were fairly slow, more memory was required.
Table 8 depicts the average final number of templates and memory remaining in the actual individual files for all users. The final test was to invoke the trainer program and ensure there were no memory overflow or template overflow errors produced as the different sets were loaded onto the voice card. It is recognized that having to delete templates causes a corresponding decrease in recognition and is a significant limitation imposed by the system.

TABLE 8
REVISED TEMPLATE LISTING
VOCABULARY SET       MEMORY (BYTES)   AVERAGE NUMBER OF TEMPLATES
TASK_WORDS             1651             8
COMMON                 2916            18
NUMBERS               13761           100
COMMANDS_TO_UNITS     16953           104
AVIATION              17304            95
SHIPS                  8900            50
LOAD                  15854            85

The subjects had no further training. At the start of the game each subject's revised templates were loaded into the recognizer. The subjects were allowed to perform their roles by inputting commands as necessary. The short time available to conduct the tests precluded evaluating the interoperability of data sets (i.e., one user operating from another's voice templates). Although the system was not designed to accomplish this, it is a point of interest when evaluating systems in a command and control environment, because in the event of a mishap to the active operator a slow transition to another operator would have a negative impact on the C2 center operation. The time to exchange vocabularies from one user to another was 62 seconds. The level of noise in the module was not measured, but during the conduct of the exercise the noise in the groups during discussions and administration was very similar to that encountered in a real command and control center. The VOTAN gain can be easily adjusted if necessary. Additionally, the 240 word vocabulary (Appendix B) was loaded into the VOTAN. A comparison of the speech recognition accuracy of the VOTAN vs. the SRI system is shown in Table 9, using subject M2 from the previous tests.
The 240 word vocabulary was loaded into 5 sets, with an average of 96 templates and 19575 bytes of memory per set, to simulate the conditions present for the NWISS vocabulary. It is evident from the data that exceeding the manufacturer's loading recommendations does in fact affect performance.

TABLE 9
SRI VS VOTAN 240 WORD RECOGNITION ACCURACY TEST
M2   SRI     99 %
     VOTAN   97.4 %

G. RESULTS

The experiment set out to focus on four separate areas:

(1) Demonstrate an application of a continuous speech system.
(2) Investigate any significant differences in the ability to input commands by speech or keyboard entry.
(3) Investigate the possibility of utilizing a speech recognition system in Navy tactical trainers to overcome the dead time in learning the game command keystrokes and entry procedures.
(4) Investigate any significant differences in speed of command entry for users familiar with standard Navy phraseology versus those unfamiliar with using speech recognition systems.

The results from the three separate runs and the data collected under the constraints described show that the VOTAN in its present configuration was unable to adapt to this C2 environment. This is primarily due to the limitations of storage and processing power of the voice card. The NWISS vocabulary is not suited to designing a distinct branching method of words from one set to other sets for correct formulation of commands. This inability to establish a tree architecture for correct command structure resulted in the number of words in most sets exceeding the recommended number by 3.5 times. As discussed in the technical documentation and earlier in this chapter, the optimum number of 10-20 words would increase recognition and provide a quicker response time. With an average of 55 words per set, the reaction time was inordinately slow and misrecognitions were higher than expected. Speed of speech input, as stated by Kavaler (1986),
is a function of:

• Speech rate
• The processing power of the speech recognizer
• The constraints placed on the way the user must speak (i.e., discrete vs. connected speech, number of switch words)

Subjects entering commands by voice under these constraints were confused and frustrated, since the time delay for the recognition to appear on the terminal was often longer than one would expect for keyboard entry. Likewise, if a misrecognition occurred at some point in the string, the user would have to attempt to back out the command or cancel it and start the entire entry over again. The design of the NWISS command entry procedures has some unique human engineering advantages for keyboard entry. The host would not allow a command to be entered if it did not form a correct entry. The terminal would beep and inhibit any incorrect keystrokes. The user could type a question mark '?' and the list of acceptable entries would be displayed. Even though this occurred in the voice entry procedure as well, the user would be disappointed by the misrecognition and often forget the voice command 'help' which would output '?'. Eventually, he felt more hostility and mistrust toward the recognizer and got flustered, forgetting which set he was in and eventually cancelling the entire command again. The frustration from a misrecognition was also attributable to unfamiliarity with the words in the sets and the proper NWISS command structure. The user usually blamed his uncertainty in the set and command structure on himself, adding to the disappointment and disillusionment with the recognizer. In later trials, a combination of voice and keyboard was used by some subjects. They used voice for certain words and commands they felt comfortable with, and then used the keyboard for the unfamiliar commands or for entries they felt required immediate and correct entry.
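The voice-versus-keyboard timing example quoted earlier (the SHIPS SPRUANCE command at about 6 seconds spoken versus 28 keystrokes at roughly 10 seconds typed) can be reproduced under rough rate assumptions. The rates below are pure assumptions chosen only to mirror those figures; they are not measured values from the experiment.

```python
# Rough input-time comparison under assumed entry rates.
def keyboard_seconds(keystrokes, keys_per_sec=3.0):
    # Assumed typing rate of ~3 keystrokes per second
    return keystrokes / keys_per_sec

def voice_seconds(command_tokens, switch_tokens, secs_per_token=0.85):
    # Switch words cost the same speaking time as any other spoken token
    return (command_tokens + switch_tokens) * secs_per_token

kb = keyboard_seconds(28)                 # "FOR SPRUA ACTIVATE AIR 1245"
vc = voice_seconds(command_tokens=5,      # SPRUANCE ACTIVATE AIR 1245 ENTER
                   switch_tokens=2)       # plus SHIPS and NUMBERS switches
```

Even with voice nominally faster, each extra switch word erodes the margin, which is one reason the heavily partitioned NWISS vocabulary felt slow in practice.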
No determination could be made of the advantages of utilizing a speech recognition system in Navy tactical trainers to overcome the dead time in learning the game command keystrokes and entry procedures. The human engineering in the design of this particular wargaming system was extremely helpful, both in providing assistance and prompts and in accepting as few as four keystrokes for certain commands. Further study is required in this area. The subjects with some familiarity with wargaming had a distinct advantage over those who did not, both with and without voice entry. This advantage could not be directly attributed to the voice recognition application but was quite evident in the level of play. They were more comfortable at the input terminal and were relied upon by the other members of the group for advice in interpreting the displays.

H. CONCLUSIONS AND RECOMMENDATIONS

Even though the VOTAN initially seemed very promising and an excellent candidate for a C2 environment, this speech recognizer is not well suited for the CCWS. It failed because the vocabulary limits of the voice card and the processing power of this recognizer were exceeded by the demands of the NWISS vocabulary. Consequently, recognition and output speed were jeopardized. A large 1000 word vocabulary and real-time processing are necessary in the CCWS application for data base queries. Additionally, the user is required to memorize which set is active and the switch words needed to enter the various sets. The user of the VOTAN must adapt his speech to the recognizer, which is unacceptable. The recognizer must be an extension of the commander, not a hindrance. The combined voice and keyboard entry employed by some of our subjects during the end of the testing indicates a possible area for future study. The application of speech entry in conjunction with keyboard, mouse, or balltab manipulation should be investigated.
The balltab is the exclusive device for an NTDS console in a shipboard command and control center. This combination would allow a smaller, more tailored vocabulary to be integrated into existing systems to aid the user, particularly if that individual must be positioned at a console or terminal.

VI. CONCLUSIONS

It is intuitive that the commander who can manage and process the tremendous flow of battle information the fastest will have more time to determine a response and make decisions, staying ahead of his adversary. As the dependency of the commander on computing resources increases, it is only natural to expect greater demands upon the man-machine interface. By including a speech recognition system in the CCWS, the commander would realize a faster information processing rate. This would result in the commander acquiring more knowledge, more quickly, on which to base his decision. As Sun Tzu, the famous Chou Dynasty philosopher and military strategist, once stated, ". . . knowledge is power and permits the wise to conquer without bloodshed and to accomplish deeds surpassing all others." This thesis evaluated the performance of a state-of-the-art 1000 word discrete template matching system and a commercially available VOTAN continuous speech recognition system. The requirements specified for the CCWS were:

• Large vocabulary (capacity > 1000 words).
• Real-time response.
• Very high recognition accuracy (> 98%).
• Adaptable to the user (i.e., the user should not have to modify or alter his speaking rate significantly).
• No deterioration in accuracy in noisy and stressful environments.

The systems evaluated in this thesis did not fulfill all the requirements for the speech application in the CCWS. Each system had its advantages and disadvantages, which were discussed in the conclusion of each respective chapter. Currently, there is no system commercially available capable of meeting all these requirements.
Even though neither system met all the requirements established for the CCWS, recent literature reflects the improvements in the Strategic Computing Program, in particular in phonetic recognition. Speech systems capable of meeting and exceeding these specifications are not far away. In fact, CINCPACFLT is scheduled to test and evaluate the speech recognition system being developed by the Strategic Computing Program. (Strategic Computing, 1985) As computers become more and more capable of displaying, storing, and processing information, it is only natural to assume that the interface between the user and computer should be optimized. We all can recount from our own experiences ". . . the costs of poorly designed interfaces. Coming in many forms, the cost can include degraded user productivity, user frustration, increased training costs, and the need to redesign . . ." (Foley, 1984). For these very reasons, the design of every interface for an interactive user-computer system must be of utmost importance. Speech recognition has long been thought of as the ideal interface and must be considered for all future systems.

APPENDIX A
SRI 100 WORD VOCABULARY

a dinner manner rose able direction many round aboard discovered March run about distance mark running above do market said accept for Mary steps according force material still account forced matter stock across foreign may stone act forget maybe stop both form our stopped bottom forty out store box forward outside story break found over straight bring for own street broken I'd page U.S.
brought I'll paid under Brown I'm paper understand building I've Paris union built idea part university development ideas right-paren unless did if river until didn't immediately road up different important Robert difficult impossible room

APPENDIX B
240 WORD VOCABULARY

one yankee Gary_Poock carriage_return Iran Sweden login_Poock accat_title load_gld3 Poock_NPS_password three logout red_sphere zero November use_that_one Captain_Ebbert up_in_detail level_two_viewer genisco_zero_parameters five alpha charlie echo juliett move_it_left San_Francisco engineering voice_technology Russian_version_of_Hormuz eight two air_routes load_the_gun load_the_server Japan Europe level_two strait_of_Hormuz connect_to_charlie change_directory_to_hunter four graphics steam_plant seven move_it_down spirograph close_out_charlie United_States North_Atlantic_Map Mediterranean_Chart six bravo delta foxtrot romeo sierra application human_factors central_expressway file_transfer_protocol nine hotel kilo oscar move_it_right Vietnam advisory business_meeting speech_recognition efficient_transmission golf quebec victor xray move_it_up Tokyo down_in_detail criteria suitability identification course command bingo proceed altitude relocate available track esm command_and_control enemy_detection launch cancel bearing orders satellite negative india lima papa uniform Korea interactive continuous continuous_speech system_integration mike tango whiskey zulu Bangladesh Hollister corporation advantages radiology automatic_recognition speed attack report station recover designate plot_esm designate_track probability probability_of_detection fire message label copy envelope correlate combination maneuver_delay Task_Force_Commander proceed_to_New_Delhi time surface minefield shore_based execute enemy Connecticut Oklahoma California place_a_marker_on_Paris bingo_all_craft_immediately neutral sensor Stockton air_field_name track_friendly bearing_and_distance Minnesota Eisenhower
relocate_the_Sunfish take Georgia Texas Utah latitude Ohio flight_controller Pango_Pango lay_a_barrier attack_barrier_target scope sensor_delay Alabama North_Carolina place_a_circle_on_Moscow shoot refuel distance contact submarine order_name Indiana Pennsylvania South_Dakota map grid missile Adak New_York track_unknown track_neutral Louisiana Colorado New_Mexico refuel_the_Connie place Vermont Daniels platform longitude torpedo Trans_World_Airlines keep_on_station ground_control_approach Atlantic_Data_Base drop Bangkok Brisbane Antwerp Arkansas user's_guide Acapulco Yokohama Diego_Garcia Pacific_Data_Base Maine Portland Aspro red_fox blue_force_one Baltimore Sevastopol chronometer plot_all_submarines Iberian Carrier Bombay Canton Africa Saigon Kitty_Hawk Vladivostok Sea_of_Japan Indonesia Arabian_Tanker save Rangoon Kiev Naples Calcutta Wyoming Honolulu John_Kennedy United_Air_Lines West_German_Torpedo

APPENDIX C
SCENARIO BRIEFING

FROM: COMSEVENTH FLEET
TO: COMMANDER, TASK GROUP ONE PT ONE
COMMANDER, TASK GROUP ONE PT TWO
OPORDER 00003

1. THIS MESSAGE CONSTITUTES AN OPERATION ORDER FOR CTG ONE PT ONE AND ONE PT TWO. IT CONSISTS OF GEOPOLITICAL BACKGROUND, COMPELLING EXECUTION OF OPERATION, TASK FORCE ORGANIZATION, OPERATION OBJECTIVES, SUMMARY OF OPPOSING FORCES, AND DIRECTION CONCERNING CONDUCT OF OPERATION.

2. DURING THE LAST 48 HOURS THE CVBGS HAVE DRAWN NEAR TO EACH OTHER AND NOW MAY BE ORGANIZED INTO A TASK FORCE OF CONSIDERABLE SIZE. AS LIGHT DAWNS THE JFK HAS RECOVERED THE LATE NIGHT LAUNCH, WHICH WAS CYCLIC DUE TO THAT CARRIER'S CLOSER PASSAGE TO ENEMY LAND BASES AND TO THE JAPANESE ISLANDS, WHICH COULD NOT BE ASSUMED TO BE FRIENDLY. THE AIR COMPLEMENT HAS BEEN AT WORK FOR AT LEAST 48 HOURS. KITTY ON THE OTHER HAND HAS JUST LAUNCHED A CAP GRID WHICH IS PROCEEDING TO POSITION. IT INCLUDES AN E2 AND AN S3. AN E3A (AWACS) WAS SUPPOSED TO ARRIVE ON STATION OUT OF ADAK ON AN AIR FORCE MISSION ABOUT ONE HOUR AGO.
HOWEVER SHE HAD NO REPORTING RESPONSIBILITY TO THE OTC AND HER PRESENCE HAS NOT AS YET BEEN CONFIRMED. P3S ARE DEPLOYED IN SUPPORT, HOWEVER.

3. TASK GROUP ONE PT ONE CONSISTS OF THE FOLLOWING SHIPS LOCATED 12 HOURS PRIOR TO THE START OF YOUR RUN FOR RECORD AS FOLS:
USS KITTYHAWK 46-30N/157E (APPROX)
USS WICHITA
USS KNOX
USS SPRUANCE
USS RATHBURNE
USS WILSON
USS MCCORMICK
USS FOX
USS LOS ANGELES
USS OMAHA
SOJ
PATRON FOUR SIX IN PLACE, MISAWA AB, 40-00N 141-50E. PATRON SEVENTEEN IN PLACE, ADAK AB, 51-50N 176-30W. UNSUBORDINATED AWACS DET IN PLACE, ADAK.
CVBG 1.2, JFK TASK GROUP, CONSISTS OF THE FOLLOWING UNITS:
USS JOHN F. KENNEDY 46-30N 155E (APPROX)
USS IOWA
USS LONG BEACH
USS JOHN ROGERS
USS TURNER JOY
USS JOHN HANCOCK
USS MAC
USS FURER
USS GAR (NEW CONSTRUCTION SSN)
SOJ

4. OPERATIONAL OBJECTIVES: (A REPEAT) THE SEA OF OKHOTSK AND THE BASES WHICH SURROUND IT PROVIDE A PRIMARY SANCTUARY FOR THE SOVIET FAR EASTERN FLEET. PROCEED TO A POSITION FROM WHICH YOUR COMBINED FORCES CAN INTERDICT SURFACE AND SUBSURFACE FORCES AND LAUNCH STRIKES AGAINST THE SOVIET LAND BASED AIR STRONGHOLDS. PREPARE TO FIGHT YOUR WAY IN AND STAY AS LONG AS POSSIBLE.
• PRIMARY MISSION ONE: PLAN FOR AND BE PREPARED TO CONDUCT A PREEMPTIVE AIR RAID ON PETRO WHEN IN POSITION AND WHEN DIRECTED BY HIGHER AUTHORITY.
• PRIMARY MISSION TWO: SEARCH FOR, IDENTIFY, AND REPORT THE SOVIET MINSK BG AND ANY RED SUBMARINES WHICH MAY BE ENCOUNTERED. BE PREPARED TO CONDUCT SHORT NOTICE PREEMPTIVE ATTACK ON THESE FORCES WHEN DIRECTED BY HIGHER AUTHORITY.

5. SUMMARY OF OPPOSING FORCES: ANTICIPATED OPPOSING FORCES CONSIST OF THE SOVIET TASK GROUP COMPRISED OF:
ONE MINSK CLASS CGH
ONE KASHIN CLASS CGL
ONE KRESTA II CLASS CG
TWO VICTOR CLASS SSN
TWO CHARLIE CLASS SSGN
ONE ECHO II CLASS SSGN
INTELLIGENCE SOURCES INDICATE POSSIBILITY THAT ADDITIONAL SURFACE UNITS OF UNKNOWN TYPE MAY HAVE DEPARTED VLADIVOSTOK WITHIN THE PAST 36 HOURS, ALTHOUGH THIS IS, AS YET, UNCONFIRMED.
24 HOURS PRIOR TO THE START OF YOUR RUN FOR RECORD, THE SURFACE FORCES WERE IN THE SEA OF OKHOTSK. IT IS ANTICIPATED THAT ONE SUB WILL CONTINUE WITH THE SOVIET BG. DURING THE LAST 36 HOURS ONE HOSTILE SSN HAS BEEN DETECTED IN THE VICINITY OF KITTY. EVASIVE ACTION AND BEST SPEED MAY HAVE LEFT IT BEHIND FOR THE TIME BEING; HOWEVER, SPEED OF TASK GROUP ADVANCE HAS BEEN SLOWED AND VIGILANCE TO THE REAR IS ADVISED. THE CONTACT THOUGHT TO BE SHADOWING THE JFK WAS NEVER CONFIRMED BY CVBG FORCES OR THE FURER ON HER TRIP NORTH. THE REMAINING SUBS ARE EXPECTED TO BE IN POSITION TO OPPOSE YOUR TRANSIT NEAR THE ISLAND PASSAGES NORTHEAST OF HOKKAIDO. INTEL STILL ESTIMATES THE GREATEST THREAT WILL BE FROM (1) LAND BASED AIR OF REGIMENTAL SIZE GROUPINGS, AND (2) SSNs THAT ARE CURRENTLY DEPLOYED OR WILL DEPLOY SHORTLY. THE SOVIET TASK GROUP CAN BE EXPECTED TO OPPOSE ENTRY TO THE SEA OF O TO SOME DEGREE.

6. DIRECTION CONCERNING THE CONDUCT OF THE OPERATION: THE CONDUCT OF THE OPERATION IS AT THE DISCRETION OF THE OFFICER IN TACTICAL COMMAND WITHIN THE FOLLOWING CONSTRAINTS AND POLICY GUIDANCE:
1. DEFCON CONDITION TWO. WE ARE NOT AT WAR. IF POSSIBLE, AVOID ACTIONS WHICH COULD PROVOKE A WAR. CONFIRM AS EARLY AS POSSIBLE WHICH COMMANDER, CVBG 1.1 OR CVBG 1.2, WILL BE OTC. KITTY IS STILL THE ONLY SHIP WITH KEYING MATERIAL NECESSARY TO GAIN LAND BASED AIR SUPPORT FROM ADAK (THIRD FLEET) AND MISAWA (SEVENTH FLEET). EXPECT LATE BREAKING GUIDANCE FROM THIS HEADQUARTERS AS EVENTS IN EUROPE COULD SIGNAL THE START OF ACTIONS IN THIS THEATRE.
2. WEAPONS ARE TIGHT AT THIS TIME. WEAPONS FREE STATUS MUST BE REQUESTED FROM ORIG UNLESS ATTACKED, IN WHICH CASE RESPONSE IN KIND ONLY IS AUTHORIZED. THAT IS TO SAY THAT THE LOSS OF AN AIRCRAFT MAY NOT BE RESPONDED TO BY AN ATTACK ON A SHIP. MINIMIZE ESCALATING ACTIONS.
• THE FIRST CHALLENGE WILL BE TO ORGANIZE THE COMBINED TASK GROUP INTO AN EFFICIENT FIGHTING UNIT. NOTIFY THIS HEADQUARTERS OF ALL SIGNIFICANT DECISIONS.
YOUR PLAN OF OPERATIONS, IN BRIEF, IS OF PRIMARY INTEREST.
• TO ENSURE SUSTAINABILITY IN THE EVENT OF A PROTRACTED CAMPAIGN, ONLY 36 AIRCRAFT MAY BE AIRBORNE AT ANY GIVEN TIME FROM EACH CARRIER (TOTAL OF 72). THIS DOES NOT INCLUDE LAND BASED P3s OR AWACS AC UNDER THE CONTROL OF THE CARRIER. PERMISSION TO USE THIRD FLEET ASSETS MUST BE GAINED FROM THIRD FLEET, VIA SEVENTH FLEET, PRIOR TO ISSUING A LAUNCH COMMAND.
SUBMIT YOUR PLAN OF ACTION PRIOR TO THE RUN FOR RECORD CONTAINING:
1. BELIEVED ENEMY INTENTIONS
2. YOUR INTENTIONS
3. CONTINGENCY PLANS

APPENDIX D
VOTAN VOCABULARY FOR NWISS

This file contains the vocabularies set up for the interactive battle group game in the war lab.

COMMON WORDS SET
001 COMMON WRONG ENTER HELP DISPLAY COMMANDS_TO_UNITS NUMBERS AVIATION SHIPS LOAD

TASK WORDS SET
002 TASK_WORDS GO_TO_SLEEP LISTEN_TO_ME INITIALIZE VERIFY
003 WRONG,HS = \0B,CM
004 ENTER,HS = \0D,CM
006 HELP,HS = ?,CM

DISPLAY WORDS SET
008 DISPLAY,CM CANCEL,HS = CANCEL\20 CIRCLE,HS = CIRCLE\20 GRID,HS = GRID\20 RADIUS,HS = RADIUS\20 SHIFT,HS = SHIFT\20 DESIGNATE,HS = DESIGNATE\20 XMARK,HS = XMARK\20 CENTER,HS = CENTER\20 FORCE,HS = FORCE\20 POSITION,HS = POSITION\20 DROP,HS = DROP\20 ERASE,HS = ERASE\20 ESM,HS = ESM\20 PLOT,HS = PLOT\20 LINE_OF_BEARING_SONAR,HS = LOB_SONAR\20 LINE_OF_BEARING_ESM,HS = LOB_ESM\20

COMMANDS TO UNITS SET
009 COMMANDS_TO_UNITS,CM TIME,HS = TIME\20 AIR,HS = AIR\20 RADAR,HS = RADAR\20 EMITTER,HS = EMITTER\20 ALTITUDE,HS = ALTITUDE\20 BEARING,HS = BEARING\20 POSITION,HS = POSITION\20 BLIP_ON,HS = BLIP ON\20 COURSE,HS = COURSE\20 OFF,HS = OFF\20 DESIGNATE,HS = DESIGNATE\20 FRIENDLY,HS = FRIENDLY\20 UNKNOWN,HS = UNKNOWN\20 EXECUTE,HS = EXECUTE\20 LAUNCH,HS = LAUNCH\20 PERISCOPE,HS = PERISCOPE\20 CHAFF,HS = RBOC\20 SUBMARINE,HS = SUBMARINE\20 HANDOVER,HS = HANDOVER\20 JOIN,HS = JOIN\20 RECOVER,HS = RECOVER\20 SEARCH,
HS = SEARCH\20 BEARING,HS = BEARING\20 BACKSPACE,HS = \08 SPACE,HS = \20 TRACK,HS = TRACK\20 OLD,HS = OLD\20 SONAR,HS = SONAR\20 PLACE_A,HS = PLACE\20 ACTIVATE,HS = ACTIVATE\20 SURFACE,HS = SURFACE\20 ESM,HS = ESM\20 SONAR,HS = SONAR\20 BARRIER,HS = BARRIER\20 FORCE,HS = FORCE\20 TRACK,HS = TRACK\20 BLIP_OFF,HS = BLIP OFF\20 COVER,HS = COVER\20 DEPTH,HS = DEPTH\20 ENEMY,HS = ENEMY\20 NEUTRAL,HS = NEUTRAL\20 EMCON,HS = EMCON\20 FIRE,HS = FIRE\20 ORDERS,HS = ORDERS\20 PROCEED,HS = PROCEED\20 REFUEL,HS = REFUEL\20 CEASE,HS = CEASE\20 INFORM,HS = INFORM\20 RECALL,HS = RECALL\20 REPORT,HS = REPORT\20 SILENCE,HS = SILENCE\20 TURN,HS = TURN\20 SPACE,HS = \20 SPEED,HS = SPEED\20 TAKE,HS = TAKE\20 ON,HS = ON\20 DECEPTIVE_COUNTER_MEASURES,HS = DECM\20 WEAPONS_FREE,HS = WEAPONS FREE\20 WEAPONS_TIGHT,HS = WEAPONS TIGHT\20 USE,HS = USE\20 BACKSPACE,HS = \08 STATION,HS = STATION\20 ALL,HS = ALL\20

NUMBERS SET
010 NUMBERS,CM ONE,HS = 1 TWO,HS = 2 THREE,HS = 3 FOUR,HS = 4 FIVE,HS = 5 SIX,HS = 6 SEVEN,HS = 7 EIGHT,HS = 8 NINER,HS = 9 ZERO,HS = 0 POINT,HS = . NORTH,HS = N\20 SOUTH,HS = S\20 EAST,HS = E\20 WEST,HS = W\20 TACK,HS = - ALPHA,HS = A BRAVO,HS = B CHARLIE,HS = C DELTA,HS = D ECHO,HS = E FOXTROT,HS = F GOLF,HS = G HOTEL,HS = H INDIA,HS = I JULIETT,HS = J KILO,HS = K LIMA,HS = L MIKE,HS = M NOVEMBER,HS = N OSCAR,HS = O PAPA,HS = P QUEBEC,HS = Q ROMEO,HS = R SIERRA,HS = S TANGO,HS = T UNIFORM,HS = U VICTOR,HS = V WHISKEY,HS = W XRAY,HS = X YANKEE,HS = Y ZULU,HS = Z SPACE,HS = \20 BACKSPACE,HS = \08

AVIATION SET
011 AVIATION,CM ALTITUDE,HS = ALTITUDE\20 BEARING,HS = BEARING\20 POSITION,
HS = POSITION\20 BINGO,HS = BINGO\20 COURSE,HS = COURSE\20 FIRE,HS = FIRE\20 LAUNCH,HS = LAUNCH\20 AEW,HS = AEW\20 ASW,HS = ASW\20 RECONN,HS = RECONN\20 RESCUE,HS = RESCUE\20 STRIKE_CAP,HS = STRCAP\20 SURCAP,HS = SURCAP\20 JAMMER,HS = JAMMER\20 NONE,HS = NONE\20 SPEED,HS = SPEED\20 PROCEED,HS = PROCEED\20 STOP,HS = STOP\20 FOR,HS = FOR\20 CH46,HS = CH46\20 E3A,HS = E3A\20 EP3E,HS = EP3E\20 FA18,HS = FA18\20 P3C,HS = P3C\20 LAMPS,HS = SH2F\20 SPACE,HS = \20 BARRIER,HS = BARRIER\20 FORCE,HS = FORCE\20 TRACK,HS = TRACK\20 TO,HS = TO\20 COVER,HS = COVER\20 AT,HS = AT\20 MISSION,HS = MISSION\20 AIRTANKER,HS = AIRTANKER\20 DECOY,HS = DECOY\20 RELAY,HS = RELAY\20 SEARCH,HS = SEARCH\20 SURVEILANCE,HS = SURVEILANCE\20 CAP,HS = CAP\20 STRIKE,HS = STRIKE\20 REFUEL,HS = REFUEL\20 TAKE,HS = TAKE\20 STATION,HS = STATION\20 A6E,HS = A6E\20 A7E,HS = A7E\20 E2C,HS = E2C\20 EA6B,HS = EA6B\20 F14A,HS = F14A\20 KA6D,HS = KA6D\20 S3A,HS = S3A\20 SH3H,HS = SH3H\20 BACKSPACE,HS = \08

SHIPS SET
012 SHIPS,NS = COMMANDS_TO_UNITS,CT = 1,CM KITTYHAWK,HS = FOR KITTY\20 FOX,HS = FOX\20 WILSON,HS = FOR WILSO\20 SPRUANCE,HS = FOR SPRUA\20 KNOX,HS = FOR KNOX\20 WONSAN,HS = FOR WONSA\20 LOS_ANGELES,HS = FOR LOSAN\20 MISAWA,HS = FOR MISAW\20 ADAK,HS = FOR ADAK\20 JFK,HS = FOR JFK\20 R.K.TURNER,HS = FOR TURNR\20 MAC,HS = FOR MAC\20 FURER,HS = FOR FURER\20 IOWA,HS = FOR IOWA\20 LONGBEACH,HS = FOR LONGB\20 GAR,HS = FOR GAR\20 PETRO,HS = FOR PETRO\20 OMAHA,HS = FOR OMAHA\20 JOHN_ROGERS,HS = FOR ROGER\20 RATHBURNE,HS = FOR RATHB\20 WICHITA,HS = FOR WICHI\20 ALEKSIUV,HS = FOR ALEKS\20 VLADIVOSTOK,HS = FOR VLAD\20 MCCORMICK,HS = FOR MCCOR\20 JOHN_HANCOCK,HS = FOR HANCK\20

LOAD SET (WEAPON SET)
013 LOAD,HS = LOAD\20,CM HARPOON,HS = HRPON\20 TLAM,HS = TLAM\20 ASROC,HS = ASROC\20 MARK46A,HS = MK46A\20 MARK57,HS = MK57\20 MARK83,HS = MK83\20 76MILLIMETER,HS = MM76\20 ROCKEYE,HS = RKEYE\20 SPARROW,HS = SPAR\20 WALLEYE,HS = WALLI\20 TASM,HS = TASM\20 APAM,HS = APAM\20 MARK46,HS = MK46\20 MARK48,HS = MK48\20 MARK82,HS = MK82\20
MARK84,HS = MK84\20 PHOENIX,HS = PHENX\20 SHRIKE,HS = SHRIK\20 SIDEWINDER,HS = SWDR\20 SM2ER,HS = SM2ER\20 ONE,HS = 1 TWO,HS = 2 THREE,HS = 3 FOUR,HS = 4 FIVE,HS = 5 SIX,HS = 6 SEVEN,HS = 7 EIGHT,HS = 8 NINER,HS = 9 ZERO,HS = 0 PINGER,HS = SSQ47\20 DIFAR,HS = SSQ53\20 DICASS,HS = SSQ62\20 SPACE,HS = \20 BACKSPACE,HS = \08 STANDARD_EXTENDED_RANGE,HS = STDER\20 STANDARD_MEDIUM_RANGE,HS = STDMR\20

LIST OF REFERENCES

Barr, Avron, and Edward A. Feigenbaum, 1981: The Handbook of Artificial Intelligence, Vol. 1, HeurisTech Press, 409 pp.

Druzhinin, V. V., and D. S. Kontorov, 1972: Foreword to the Russian Edition of Concept, Algorithm, Decision (A Soviet View), by S. M. Shtemenko (General of the Army, U.S.S.R.), Superintendent of Documents, U.S. Government Printing Office, Catalog No. D301.79:6, Stock No. 008-070-0034409.

Dupuy, Col. T. N., USA, Ret., 1986: In Search of an American Philosophy of Command and Control, A Preliminary Draft, Class Notes OS3636, Summer Quarter, 25 pp.

Foley, James D., Victor L. Wallace, and Peggy Chan, 1984: The Human Factors of Computer Graphics Interaction Techniques. IEEE Computer Graphics and Applications, November, 13-43.

Harris, C., L. Lane, P. Shaha, and J. Tombrella, 1983: NWISS, Naval Warfare Interactive Simulation System Users Manual. Warlab Handout, Naval Postgraduate School, 28 November, 17 pp.

Kavaler, Robert A., 1986: The Design and Evaluation of a Speech Recognition System for Engineering Workstations. Ph.D. dissertation, University of California, Berkeley, 162 pp.

Kurzweil, Raymond, 1984: The Coming Age of Intelligent Machines or "What is 'AI,' Anyway?". Keynote Address, The Institute of Electrical and Electronics Engineers (IEEE) International Conference on Computer Design, 14 pp.

Local Command Center Network (LCCN) Statement of Work for Request for Proposal, 23 October 1978.

Miller, G. A., 1956: The magical number seven, plus or minus two: Some limits on our capacity for processing information.
Psychological Review, 63, 81-97.

Murveit, Hy, and Donald Bell, 1986: Speech Entry to a Natural-Language-Accessed Data Base. SRI Project 6096, Contract N00039-83-K-0442, Menlo Park, CA: SRI International, 75 pp.

Naval Ocean Systems Center (NOSC), 1985: Navy Exploratory Development Program FY 86 Block Plan, Combat Direction, N02C, 15 August, 1-37.

Orr, George E., 1983: Combat Operations C3I: Fundamentals and Interactions. Maxwell Air Force Base, AL: Air University Press, 99 pp.

Pallett, David S., 1985: Performance Assessment of Automatic Speech Recognizers. Journal of Research of the National Bureau of Standards, 90, 5, 371-387.

Poggio, A., J. J. Garcia Luna Aceves, E. J. Craighill, D. Moran, L. Aguilar, D. Worthington, and J. Hight, SRI International, 1985: CCWS: A Computer-Based Multimedia Information System. IEEE, October, 92-103.

Poock, Gary K., 1980: Experiments With Voice Input For Command and Control: Using Voice Input To Operate A Distributed Computer Network. Naval Postgraduate School, Monterey, CA, 34 pp.

Poock, Gary K., 1981: A Longitudinal Study of Computer Voice Recognition Performance and Vocabulary Size. Naval Postgraduate School, Monterey, CA, 32 pp.

Poock, Gary K., 1986a: A Longitudinal Study of Five Year Old Speech Reference Patterns. Journal of the American Voice Input/Output Society, 3, 13-18.

Poock, Gary K., 1986b: Speech Recognition Research, Applications and International Efforts. Invited Paper for the 1986 Human Factors Society, not yet published.

Ryan, Thomas A., Jr., Brian L. Joiner, and Barbara F. Ryan, 1981: Minitab Reference Manual. University Park, PA, The Pennsylvania State University, 154 pp.

Salfer, D. L., 1985: Voice Automation of Ship Control. Master's Thesis, Naval Postgraduate School, Monterey, CA, September, 59 pp.

Strategic Computing Program, 1985: Chapter 2, Integration, Transition, and Performance Evaluation of Speech Technology. Draft Copy, December.

Tanenbaum, A. S., 1981: Computer Networks. Prentice-Hall, Inc., 517 pp.
U.S., Joint Chiefs of Staff, Publication Number 1, 1984: Department of Defense Dictionary of Military and Associated Terms. U.S. Government Printing Office, 404 pp.

VTR 60x0 Series II, 1985: VOTAN Manufacturer's Technical Manual, Guide to Procedures. VOTAN, Fremont, CA, 81 pp.

VTR 6050 II, 1985: User's Guide, VOTAN Manufacturer's Technical Manual, Reference Manual. VOTAN, Fremont, CA, 178 pp.

Wohl, Joseph G., 1981: Force Management Decision Requirements for Air Force Tactical Command and Control. IEEE Transactions on Systems, Man, and Cybernetics, SMC-11, 9, 618-639.

INITIAL DISTRIBUTION LIST

1. Defense Technical Information Center, Cameron Station, Alexandria, VA 22304-6145 (2 copies)
2. Library, Code 0142, Naval Postgraduate School, Monterey, CA 93943-5002 (2 copies)
3. Defense Contract Audit Agency, Cameron Station, Alexandria, VA 22314 (1 copy)
4. Gary K. Poock, Code 55Pk, Naval Postgraduate School, Monterey, CA 93943-5000 (30 copies)
5. Mr. Kenny Avila, Lockheed Aircraft Corp., Dept. 1-332, P.O. Box 33, Ontario, CA 91761 (1 copy)
6. Dr. Janet Baker, Dragon Systems, Chapel Bridge Park, 55 Chapel Street, Newton, MA 02158 (1 copy)
7. Dr. Sarah Blackstone, ASHA, 10801 Rockville Pike, Rockville, MD 20852 (1 copy)
8. Donald Bell, SRI International, 333 Ravenswood, Menlo Park, CA 94025 (1 copy)
9. Hy Murveit, SRI International, 333 Ravenswood, Menlo Park, CA 94025 (1 copy)
10. Bill Lee, LITTON Systems, M/S 2-01, 5115 Calvert Rd., College Park, MD 20740 (1 copy)
11. Leon Lerman, LMSC, 0/86-60, BT53, P.O. Box 3504, Sunnyvale, CA 94088 (1 copy)
12. B. Jay J. Martin, Perceptronics, 21111 Irwin St., Woodland Hills, CA 91367 (1 copy)
13. Paul A. Manoione, Speech Systems Incorp., 18356 Oxnard St., Tarzana, CA 91356 (1 copy)
14. Dr. David Pallett, National Bureau of Standards, A216 Technology Bldg. 625, Gaithersburg, MD 20899 (1 copy)
15. Ralph Pettit, Cubic Defense Systems, 9333 Balboa Ave., San Diego, CA 92123 (1 copy)
16. LCDR Michael A. LeFever, USN, 579B Wilkes Lane, Monterey, CA 93940 (1 copy)
17. CDR J. Stewart, Code 55ST, Naval Postgraduate School, Monterey, CA 93943-5000 (2 copies)
18. C3 Academic Group,
Code 74, Prof. M. K. Sovereign, Naval Postgraduate School, Monterey, CA 93943-5000 (2 copies)
19. Naval War College, Wargaming Department, Sims Hall, Newport, RI 02841-5010 (1 copy)
20. MAJOR T. J. Brown, USAF, Code 39, Naval Postgraduate School, Monterey, CA 93943-5000 (1 copy)
21. Headquarters Rome Air Development Center, AFSC, Office of the Chief Scientist, Attn: F. Diamond, Griffiss AFB, NY 13441 (2 copies)
22. Commander, Code 421, Naval Ocean Systems Center, San Diego, CA 92152 (1 copy)
23. USA CECOM, AMSEL-SEI-F, Attn: Dr. Israel Mayk, Ft. Monmouth, NJ 07703 (1 copy)
24. National Defense University, Fort Lesley J. McNair, Washington, DC 20319 (1 copy)
25. Michael J. Zyda, Code 52, Naval Postgraduate School, Monterey, CA 93943-5000 (1 copy)
26. Frederick C. Johnson, 4141 Jutland Drive, San Diego, CA 92117 (1 copy)
27. Steve Nunn, Commander, Naval Ocean Systems Center, Code 441, San Diego, CA 92152 (1 copy)